Sequence-to-point learning with neural networks for nonintrusive load monitoring
Abstract
Energy disaggregation (a.k.a. nonintrusive load monitoring, NILM), a single-channel blind source separation problem, aims to decompose the mains, which records the whole-house electricity consumption, into appliance-wise readings. This problem is difficult because it is inherently unidentifiable. Recent approaches have shown that the identifiability problem could be reduced by introducing domain knowledge into the model. Deep neural networks have been shown to be a promising approach for these problems, but sliding windows are necessary to handle the long sequences that arise in signal processing problems, which raises the issue of how to combine predictions from different sliding windows. In this paper, we propose sequence-to-point learning, where the input is a window of the mains and the output is a single point of the target appliance. We use convolutional neural networks to train the model. Interestingly, we systematically show that the convolutional neural networks can inherently learn the signatures of the target appliances, which are automatically added into the model to reduce the identifiability problem. We applied the proposed neural network approaches to real-world household energy data and show that the methods achieve state-of-the-art performance, improving two standard error measures by 84% and 92%.
Energy disaggregation (Hart 1992) is a single-channel blind source separation (BSS) problem that aims to decompose the whole energy consumption of a dwelling into the energy usage of individual appliances. The purpose is to help households reduce their energy consumption by helping them understand what is causing them to use energy; it has been shown that disaggregated information can help householders reduce energy consumption by as much as 5-15% (Fischer 2008). However, current electricity meters can only report whole-home consumption data. This creates demand for machine-learning tools that infer appliance-specific consumption.
∗ Equal contribution. Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Energy disaggregation is unidentifiable and thus a difficult prediction problem because it is a single-channel BSS problem; we want to extract more than one source from a single observation. Additionally, there are many sources of uncertainty in the prediction problem, including noise in the data, lack of knowledge of the true power usage for every appliance in a given household, multiple devices exhibiting similar power consumption, and simultaneous switching on/off of multiple devices. Therefore energy disaggregation has been an active area for the application of artificial intelligence and machine learning techniques. Popular approaches have been based on factorial hidden Markov models (FHMM) (Kolter and Jaakkola 2012; Parson et al. 2012; Zhong, Goddard, and Sutton 2013; 2014; 2015; Lange and Bergés 2016) and signal processing methods (Pattem 2012; Zhao, Stankovic, and Stankovic 2015; 2016; Batra, Singh, and Whitehouse 2016; Tabatabaei, Dick, and Xu 2017). Recently, it has been shown that single-channel BSS could be modelled by using sequence-to-sequence (seq2seq) learning with neural networks (Grais, Sen, and Erdogan 2014; Huang et al. 2014; Du et al. 2016). In particular, it has been applied to energy disaggregation (Kelly and Knottenbelt 2015a), where both convolutional (CNN) and recurrent (RNN) neural networks were employed. The idea of sequence-to-sequence learning is to train a deep network to map between an input sequence, such as the mains power readings in the NILM problem, and an output sequence, such as the power readings of a single appliance. A difficulty immediately arises when applying seq2seq in signal processing applications such as BSS. In these applications, the input and output sequences can be long; for example, in one of our data sets, the input and output sequences are 14,400 time steps long.
Such long sequences can make training difficult, both because of memory limitations in current graphics processing units (GPUs) and, with RNNs, because of the vanishing gradient problem. A common way to avoid these problems is a sliding window approach, that is, training the network to map a window of the input signal to the corresponding window of the output signal. However, this approach has several difficulties. Each element of the output signal is predicted many times, once for each sliding window that contains it; an average of the multiple predictions is naturally used, which consequently smooths the edges. Further, we expect that some of the sliding windows will provide a better prediction of a single element than others, particularly those windows where the element is near the midpoint of the window rather than the edges, so that the network can make use of all nearby regions of the input signal, past and future. But a simple sliding window approach cannot exploit this information.
In this paper, we propose a different architecture called sequence-to-point learning (seq2point) for single-channel BSS. This uses a sliding window approach, but given a window of the input sequence, the network is trained to predict the output signal only at the midpoint of the window. This has the effect of making the prediction problem easier on the network: rather than needing to predict in total $W(T - W)$ outputs as in the seq2seq method, where $T$ is the length of the input signal and $W$ the size of the sliding window, the seq2point method predicts only $T$ outputs. This allows the neural network to focus its representational power on the midpoint of the window, rather than on the more difficult outputs on the edges, hopefully yielding more accurate predictions. We provide both an analytical and empirical analysis of the methods, showing that seq2point has a tighter approximation to the target distribution than seq2seq learning.
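To make the difference between the two windowing schemes concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function names `seq2point_pairs` and `average_overlapping`, the zero padding of the mains by `window // 2` on each side, and the toy signals are all assumptions introduced here for illustration. The first function builds one input window per time step with the appliance reading at the window midpoint as the target; the second shows the seq2seq-style aggregation, in which every time step is predicted by several overlapping windows and the predictions are averaged.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def seq2point_pairs(mains, appliance, window):
    """Build (input window, midpoint target) pairs for an odd window length.

    Assumption of this sketch: the mains is zero-padded by window // 2 on
    each side so that every one of the T time steps gets exactly one window
    centred on it.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a single midpoint")
    mains = np.asarray(mains, dtype=float)
    appliance = np.asarray(appliance, dtype=float)
    half = window // 2
    padded = np.pad(mains, (half, half))           # zeros on both sides
    windows = sliding_window_view(padded, window)  # shape (T, window)
    return windows, appliance                      # target t is appliance[t]


def average_overlapping(window_preds, total_len):
    """Seq2seq-style aggregation: average all window predictions per step.

    window_preds[s] holds the prediction for time steps s .. s + W - 1,
    so there must be total_len - W + 1 windows (stride 1, no padding).
    """
    window_preds = np.asarray(window_preds, dtype=float)
    n_windows, window = window_preds.shape
    if n_windows != total_len - window + 1:
        raise ValueError("expected total_len - window + 1 stride-1 windows")
    sums = np.zeros(total_len)
    counts = np.zeros(total_len)
    for start in range(n_windows):
        sums[start:start + window] += window_preds[start]
        counts[start:start + window] += 1
    return sums / counts


# Toy signals (hypothetical values, only to exercise the functions).
mains = np.arange(10.0)
appliance = 0.5 * mains

windows, targets = seq2point_pairs(mains, appliance, window=5)

# Stand-in for a seq2seq network: a perfect "oracle" that returns the true
# appliance window, so averaging the overlaps recovers the signal exactly.
oracle_preds = sliding_window_view(appliance, 5)
averaged = average_overlapping(oracle_preds, total_len=len(appliance))
```

In this sketch the seq2point targets form a single length-$T$ vector, one value per window, whereas the seq2seq predictions form a matrix with one row per window that must be collapsed by averaging. With a real network in place of the oracle, that averaging step is where sharp appliance edges would get smoothed.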
On two different real-world NILM data sets, UK-DALE (Kelly and Knottenbelt 2015b) and REDD (Kolter and Johnson 2011), we find that sequence-to-point learning performs dramatically better than previous work, with as much as 83% reduction in error.
Finally, to have confidence in the models, it is vital to interpret the model predictions and understand what information the neural networks for NILM are relying on to make their predictions. By visualizing the feature maps learned by our networks, we found that our networks automatically extract useful features of the input signal, such as change points, and typical usage durations and power levels of appliances. Interestingly, these signatures have been commonly incorporated into handcrafted features and architectures for the NILM problem (Kolter and Jaakkola 2012; Parson et al. 2012; Pattem 2012; Zhao, Stankovic, and Stankovic 2015; Zhong, Goddard, and Sutton 2014; 2015; Batra, Singh, and Whitehouse 2016; Tabatabaei, Dick, and Xu 2017), but in our work these features are learned automatically.
Energy disaggregation
The goal of energy disaggregation is to recover the energy consumption of individual appliances from the mains signal, which measures the total electricity consumption. Suppose we have observed the mains $Y$, which indicates the total power in Watts in a household, where $Y = (y_1, y_2, \ldots, y_T)$ and $y_t \in \mathbb{R}_+$. Typically there are a number of appliances in the same house. For each appliance, its reading is denoted by $X_i = (x_{i1}, x_{i2}, \ldots, x_{iT})$, where $x_{it} \in \mathbb{R}_+$. At each time step, $y_t$ is assumed to be the sum of the readings of individual appliances, possibly plus a Gaussian noise factor with zero mean and variance $\sigma^2$, such that $y_t = \sum_i x_{it} + \epsilon_t$. Often we are only interested in $I$ appliances, i.e., the ones that use the most energy; others will be regarded as an unknown factor $U = (u_1, \cdots, u_T)$. The model could then be represented as $y_t = \sum_{i=1}^{I} x_{it} + u_t + \epsilon_t$.
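The additive model above can be illustrated with a small simulation. This is a sketch under stated assumptions rather than anything taken from the paper's data: the helper names `on_off_signal` and `simulate_mains`, the two periodic on/off appliances, their power levels, the constant unknown load, and the noise scale are all hypothetical. It only shows how a mains reading $y_t$ is assembled from appliance readings $x_{it}$, the unknown factor $u_t$, and Gaussian noise $\epsilon_t$.

```python
import numpy as np


def on_off_signal(power, period, on_steps, n_steps):
    """Periodic appliance trace: `power` Watts for the first `on_steps`
    steps of every `period` steps, zero otherwise (a toy assumption)."""
    t = np.arange(n_steps)
    return np.where(t % period < on_steps, power, 0.0)


def simulate_mains(appliances, unknown, sigma, rng):
    """Return (mains, noise) with mains[t] = sum_i x[i, t] + u[t] + eps[t].

    `sigma` is the noise standard deviation, so the variance is sigma ** 2.
    """
    noise = rng.normal(0.0, sigma, size=appliances.shape[1])
    mains = appliances.sum(axis=0) + unknown + noise
    return mains, noise


rng = np.random.default_rng(0)
T = 1000  # number of time steps (hypothetical)

# Rows are the I = 2 appliances of interest (hypothetical power levels).
appliances = np.stack([
    on_off_signal(2000.0, period=200, on_steps=20, n_steps=T),  # kettle-like
    on_off_signal(100.0, period=60, on_steps=30, n_steps=T),    # fridge-like
])
unknown = np.full(T, 50.0)  # u_t: everything not modelled explicitly

mains, noise = simulate_mains(appliances, unknown, sigma=5.0, rng=rng)
```

Disaggregation is the inverse task: given only `mains`, recover each row of `appliances`. Because many different combinations of rows, `unknown`, and `noise` sum to the same `mains`, the problem is unidentifiable without further assumptions.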
The additive factorial hidden Markov model (AFHMM) is a natural approach to represent this model (Kolter and Jaakkola 2012; Pattem 2012; Zhong, Goddard, and Sutton 2014). Various inference algorithms could then be employed to infer the appliance signals $\{X_i\}$ (Kolter and Jaakkola 2012; Zhong, Goddard, and Sutton 2014; Shaloudegi et al. 2016). It is well known that the problem is still unidentifiable. To tackle the identifiability problem, various approaches have been proposed by incorporating domain knowledge into the model. For example, local information, e.g., appliance power levels, ON-OFF state changes, and durations, has been incorporated into the model (Kolter and Jaakkola 2012; Parson et al. 2012; Pattem 2012; Zhao, Stankovic, and Stankovic 2015; Tabatabaei, Dick, and Xu 2017); others have incorporated global information, e.g., total number of cycles and total energy consumption (Zhong, Goddard, and Sutton 2014; 2015; Batra, Singh, and Whitehouse 2016). However, the domain knowledge required by these methods needs to be extracted manually, such as via handcrafted features based on the observation data, which makes the methods more difficult to use. Instead, we propose to employ neural networks to extract those features automatically during learning.
Sequence-to-sequence learning
Kelly and Knottenbelt [2015a] have applied deep learning methods to NILM. They propose several different architectures, which learn a nonlinear regression between a sequence of the mains readings and a sequence of appliance readings with the same time stamps. We will refer to this as a sequence-to-sequence approach. Although RNN architectures are most commonly used in sequence-to-sequence learning for text (Sutskever, Vinyals, and Le 2014), for NILM Kelly and Knottenbelt [2015a] employ both CNNs and RNNs. Similar sequence-to-sequence neural network approaches have been applied to single-channel BSS problems in audio and speech (Grais, Sen, and Erdogan 2014; Huang et al. 2014; Du et al. 2016).
Sequence-to-sequence architectures define a neural network $F_s$ that maps sliding windows $Y_{t:t+W-1}$ of the input mains power to corresponding windows $X_{t:t+W-1}$ of the output appliance power, that is, they model $X_{t:t+W-1} = F_s(Y_{t:t+W-1}) + \epsilon$, where $\epsilon$ is $W$-dimensional Gaussian random noise. Then, to train the network on a pair $(X, Y)$ of full sequences, the loss function is
Journal: CoRR
Volume: abs/1612.09106
Publication date: 2016